Parallelizing XML data-streaming workflows via MapReduce

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Parallelizing XML data-streaming workflows via MapReduce

In prior work it has been shown that the design of scientific workflows can benefit from a collection-oriented modeling paradigm which views scientific workflows as pipelines of XML stream processors. In this paper, we present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the Map-Reduce framework. Pipelines in our approach consist...

متن کامل

Parallelizing XML Processing Pipelines via MapReduce

We present approaches for exploiting data parallelism in XML processing pipelines through novel compilation strategies to the MapReduce framework. Pipelines in our approach consist of sequences of processing steps that consume XML-structured data and produce, often through calls to “black-box” functions, modified (i.e., updated) XML structures. Our main contributions are a set of strategies for...

متن کامل

Parallelizing Structural Joins to Process Queries over Big XML Data Using MapReduce

Processing XML queries over big XML data using MapReduce has been studied in recent years. However, the existing works focus on partitioning XML documents and distributing XML fragments into different compute nodes. This attempt may introduce high overhead in XML fragment transferring from one node to another during MapReduce execution. Motivated by the structural join based XML query processin...

متن کامل

Parallelizing bioinformatics applications with MapReduce

Current bioinformatics applications require both management of huge amounts of data and heavy computation: fulfilling these requirements calls for simple ways to implement parallel computing. MapReduce is a general-purpose parallelization technology that appears to be particularly well adapted to this task. Here we report on its application, using its open source implementation Hadoop, to two r...

متن کامل

On Parallelizing Streaming Algorithms

We study the complexity of parallelizing streaming algorithms (or equivalently, branching programs). If M(f) denotes the minimum average memory required to compute a function f(x1, x2, . . . , xn) how much memory is required to compute f on k independent streams that arrive in parallel? We show that when the inputs (updates) are sampled independently from some domain X and M(f) = Ω(n), then com...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Journal of Computer and System Sciences

سال: 2010

ISSN: 0022-0000

DOI: 10.1016/j.jcss.2009.11.006